| Supplier | Sub-Districts | 1849 Deaths per 10,000 | 1854 Deaths per 10,000 |
|---|---|---|---|
| Joint Southwark & Vauxhall/Lambeth (Treated) | 16 | 130.1 | 84.9 |
| Southwark & Vauxhall Only (Untreated) | 12 | 134.9 | 146.6 |
2025-06-10
Two (or more) units: some treated/exposed, some untreated
Two time periods: one prior to first treatment, one after
Example: South London “Grand Experiment” from Coleman 2024
Untreated: Southwark & Vauxhall Districts (12)
Treated: Joint Southwark & Vauxhall/Lambeth Districts (16)
Time Periods: 1849 (pre-treatment) and 1854 (post-treatment) outbreaks
| Unit | Pre-Treatment | Post-Treatment |
|---|---|---|
| Exposed | \(Y_{10} = Y_{10}(0)\) | \(Y_{11} = Y_{11}(1)\) |
| Unexposed | \(Y_{00} = Y_{00}(0)\) | \(Y_{01} = Y_{01}(0)\) |
Treatment Effect:
\[ \theta = E[Y_{11}(1) - Y_{11}(0)] \]
Within each unit, we have an interrupted time series:
\[ \begin{aligned} \Delta_1 &= Y_{11} - Y_{10} \\ \Delta_0 &= Y_{01} - Y_{00} \end{aligned} \]
Key Idea
Use the observed change \(\Delta_0\) in the untreated unit as a stand-in for the unobserved change the treated unit would have experienced without treatment, \(Y_{11}(0) - Y_{10}(0)\).
\[ \begin{aligned} \hat{Y}_{11}(1) &= Y_{11} \\ \hat{Y}_{11}(0) &= Y_{10} + \color{darkgreen}{(Y_{01} - Y_{00})} \\ \hat{\theta} &= \color{purple}{(Y_{11} - Y_{10})} - \color{darkgreen}{(Y_{01} - Y_{00})} \\ \end{aligned} \]
| Supplier | 1849 Deaths per 10,000 | 1854 Deaths per 10,000 | Diff, 1854-1849 |
|---|---|---|---|
| Joint Southwark & Vauxhall/Lambeth (Treated) | 130.1 | 84.9 | -45.2 |
| Southwark & Vauxhall Only (Untreated) | 134.9 | 146.6 | 11.8 |
| Diff, Treated-Untreated | -4.8 | -61.8 | -57.0 |
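As a quick check, the 2x2 estimate can be recomputed from the rounded rates in the table. This is only a sketch: the dictionary layout and function name are illustrative choices, not part of the original analysis. Working from the rounded rates gives an untreated change of about 11.7 and a DID of about -56.9, which differ from the tabulated 11.8 and -57.0 by 0.1; a plausible explanation (an assumption here) is that the table was computed from unrounded rates.

```python
# Cholera death rates per 10,000 (rounded values copied from the table).
rates = {
    "treated": {"1849": 130.1, "1854": 84.9},
    "untreated": {"1849": 134.9, "1854": 146.6},
}


def two_by_two_did(rates):
    """Return (Delta_1, Delta_0, theta_hat) for a 2x2 design."""
    delta_1 = rates["treated"]["1854"] - rates["treated"]["1849"]
    delta_0 = rates["untreated"]["1854"] - rates["untreated"]["1849"]
    return delta_1, delta_0, delta_1 - delta_0


delta_1, delta_0, theta_hat = two_by_two_did(rates)
print(f"Treated change (Delta_1):   {delta_1:.1f}")
print(f"Untreated change (Delta_0): {delta_0:.1f}")
print(f"DID estimate (theta_hat):   {theta_hat:.1f}")
```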
\[ Y_{it} = \alpha_i + \gamma_t + \theta I(X_{it} = 1)+\epsilon_{it}, \]
where:
\(\alpha_i\) is the fixed effect for unit \(i\),
\(\gamma_t\) is the fixed effect for time \(t\),
\(\epsilon_{it}\) is the error term for unit \(i\) in time \(t\),
\(X_{it}\) is the indicator of whether unit \(i\) is treated at time \(t\), and
\(\theta\) is the treatment effect estimand.
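In the two-unit, two-period case, the coefficient on \(X_{it}\) in this model reproduces the 2x2 DID contrast. The sketch below illustrates that with ordinary least squares in NumPy, using the rounded cholera rates. Encoding \(\alpha_i\) and \(\gamma_t\) as an intercept plus one unit dummy and one period dummy is a parametrization chosen here for illustration. With four observations and four parameters the fit is exact, so this shows the point estimate only, not inference.

```python
import numpy as np

# One row per (unit, period) cell; rates are deaths per 10,000 (rounded).
y = np.array([130.1, 84.9, 134.9, 146.6])
unit = np.array([1, 1, 0, 0])   # 1 = treated supplier area, 0 = untreated
post = np.array([0, 1, 0, 1])   # 0 = 1849, 1 = 1854
x = unit * post                 # X_it: treated unit in the post period

# Columns: intercept, unit fixed effect, time fixed effect, treatment.
design = np.column_stack([np.ones(4), unit, post, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

theta_hat = coef[3]
did_by_hand = (84.9 - 130.1) - (146.6 - 134.9)
print(f"TWFE theta_hat: {theta_hat:.1f}")
print(f"2x2 DID:        {did_by_hand:.1f}")
```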
Inference can be conducted using the TWFE regression model. This accounts for variability in the outcome if there are multiple treated/untreated units and multiple periods.
Standard errors are generally clustered by unit to account for correlation among a unit's observations over time. Block-bootstrap variance estimation, which resamples whole units, is an alternative.
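A minimal sketch of a unit-level block bootstrap follows. The data are simulated purely for illustration (the sample size, effect size, and noise levels are all assumptions), and units are resampled within treatment arm so that neither arm is ever empty. With two periods, resampling a unit's pre/post pair is equivalent to resampling its change score, which is what the code does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-period panel (illustrative values, not real data).
n_units, true_theta = 40, -5.0
treated = np.arange(n_units) < n_units // 2
unit_effect = rng.normal(0, 3, n_units)
pre = 50 + unit_effect + rng.normal(0, 1, n_units)
post = 52 + unit_effect + true_theta * treated + rng.normal(0, 1, n_units)
change = post - pre  # per-unit pre/post change

theta_hat = change[treated].mean() - change[~treated].mean()

# Block bootstrap: draw whole units with replacement, separately by arm.
treated_idx = np.flatnonzero(treated)
control_idx = np.flatnonzero(~treated)
n_boot = 2000
boot = np.empty(n_boot)
for b in range(n_boot):
    t = rng.choice(treated_idx, size=treated_idx.size, replace=True)
    c = rng.choice(control_idx, size=control_idx.size, replace=True)
    boot[b] = change[t].mean() - change[c].mean()

se = boot.std(ddof=1)
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"theta_hat = {theta_hat:.2f}, bootstrap SE = {se:.2f}")
print(f"95% percentile interval: ({ci_low:.2f}, {ci_high:.2f})")
```

The percentile interval is one of several bootstrap interval choices; it is used here only because it is the simplest to read.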
Caution
This accounts for statistical uncertainty but not causal uncertainty in the model assumptions. Those cannot be fully assessed statistically.
See the analysis/zika-did-handout file for an example analysis, with visualization and regression-based estimation.
Parallel trends (in expectation of potential outcomes)
No spillover
No anticipation/clear time point for treatment
\[ E[\color{purple}{Y_{11}(0) - Y_{10}(0)}] = E[\color{darkgreen}{Y_{01}(0) - Y_{00}(0)}] \]
In the absence of treatment, the treated and untreated units would have the same expected outcome trend over time.
There is no effect of the treatment on any untreated units (similar to a consistency or SUTVA assumption across units).
There is no effect of the treatment (or its announcement) prior to the time period assigned as its start (similar to a consistency or SUTVA assumption across periods). A washout period can be incorporated if necessary.
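To see how the three assumptions work together, here is a sketch of the identification argument using only the notation already introduced. The first line replaces observed outcomes with potential outcomes (which relies on no spillover and no anticipation), the second adds and subtracts \(E[Y_{11}(0)]\), and the third applies parallel trends:

\[ \begin{aligned} E[Y_{11} - Y_{10}] - E[Y_{01} - Y_{00}] &= E[Y_{11}(1) - Y_{10}(0)] - E[Y_{01}(0) - Y_{00}(0)] \\ &= E[Y_{11}(1) - Y_{11}(0)] + E[Y_{11}(0) - Y_{10}(0)] - E[Y_{01}(0) - Y_{00}(0)] \\ &= \theta \end{aligned} \]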
Placebo/specification tests:
In-time: conduct the same DID analysis on a time period prior to the actual treatment initiation
In-space: conduct the same DID analysis as if an untreated unit were the treated one
Alternative outcome: conduct the same DID analysis on an outcome that should not be affected by the treatment
These approaches can be used either:
as a heuristic justification for the assumption,
to obtain a null distribution for permutation tests, or
to adjust the estimate for the “null” effect (difference-in-difference-in-differences or triple-differences).
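An in-time placebo test can be sketched as follows, using simulated data with two pre-treatment periods and one post-treatment period. Everything here is an illustrative assumption: the sample size, the effect size, and the fact that parallel trends holds by construction. Under those conditions the placebo DID on the two pre-periods should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated panel: periods 0 and 1 are pre-treatment, period 2 is post.
n_units, true_theta = 200, -4.0
treated = np.arange(n_units) < n_units // 2
alpha = rng.normal(0, 3, n_units) + 2.0 * treated  # unit effects (levels differ)
gamma = np.array([0.0, 1.0, 2.5])                  # common time effects
y = alpha[:, None] + gamma[None, :] + rng.normal(0, 1, (n_units, 3))
y[treated, 2] += true_theta                        # effect only in period 2


def did(y_before, y_after, treated):
    """2x2 DID on per-unit changes between two periods."""
    change = y_after - y_before
    return change[treated].mean() - change[~treated].mean()


placebo = did(y[:, 0], y[:, 1], treated)  # both periods pre-treatment
actual = did(y[:, 1], y[:, 2], treated)   # last pre-period vs. post-period
print(f"Placebo (in-time) DID: {placebo:.2f}")
print(f"Actual DID:            {actual:.2f}")
```

A placebo estimate near zero is consistent with parallel trends in the pre-period but does not prove it holds in the post-period.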
Changing the scale of the outcome changes the parallel trends assumption. The most common transformation is to use the natural log.
E.g., \(\log(Y_{it}) = \alpha_i + \gamma_t + \theta I(X_{it}=1) + \epsilon_{it}\)
Changes parallel trends assumption to:
\[ \begin{aligned} E[\color{purple}{\log Y_{11}(0) - \log Y_{10}(0)}] &= E[\color{darkgreen}{\log Y_{01}(0) - \log Y_{00}(0)}] \\ E \left[ \log \left( \color{purple}{\frac{Y_{11}(0)}{Y_{10}(0)}} \right) \right] &= E \left[ \log \left( \color{darkgreen}{\frac{Y_{01}(0)}{Y_{00}(0)}} \right) \right] \end{aligned} \]
Caution
In general, parallel trends can hold on at most one scale
This changes the estimand (e.g., additive -> multiplicative)
See Kahn-Lang and Lang (2020) for more considerations and Feng and Bilinski (2024) for examples of different scales/specifications.
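To make the change of estimand concrete, the sketch below recomputes the 2x2 contrast on the log scale using the rounded cholera rates. Taking logs of the aggregate rates is a simplification assumed here for illustration; a full analysis would model \(\log(Y_{it})\) across sub-districts. The exponentiated log-scale DID is a ratio of rate ratios, about 0.60 with these inputs, rather than a difference in rates.

```python
import math

# Rounded rates per 10,000: y10/y11 = treated 1849/1854, y00/y01 = untreated.
y10, y11 = 130.1, 84.9
y00, y01 = 134.9, 146.6

# Additive-scale DID (difference of differences in rates).
theta_additive = (y11 - y10) - (y01 - y00)

# Log-scale DID (difference of differences in log rates).
theta_log = (math.log(y11) - math.log(y10)) - (math.log(y01) - math.log(y00))

# Exponentiating gives the multiplicative estimand: a ratio of rate ratios.
ratio_of_ratios = math.exp(theta_log)

print(f"Additive DID:    {theta_additive:.1f} deaths per 10,000")
print(f"Log-scale DID:   {theta_log:.3f}")
print(f"Ratio of ratios: {ratio_of_ratios:.2f}")
```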
Incorporating covariates makes the parallel trends assumption conditional on those covariates.
E.g., \(Y_{it} = \alpha_i + \gamma_t + \theta I(X_{it}=1) + \beta Z_{i} + \epsilon_{it}\)
Changes parallel trends assumption to:
\[ E[\color{purple}{Y_{11}(0) - Y_{10}(0)} ~ | ~ Z_1] = E[\color{darkgreen}{Y_{01}(0) - Y_{00}(0)} ~ | ~ Z_0] \]
Caution
This makes the parallel trends assumption more complex to consider and requires modeling covariates
This changes the estimand and assumes the effect is homogeneous across covariates
See Caetano and Callaway (2023) for issues that arise with time-varying covariates.